Member of our team:
| Fullname | Student ID |
|---|---|
| Đoàn Ánh Dương | 20127474 |
Necessary file imported here.
Note: Some external library out of min_ds-env need to be installed:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from datetime import datetime, timedelta
from dateutil import relativedelta
import os
import glob
Environment using within this final project.
import sys
sys.executable
'/home/duong/anaconda3/bin/python'
Set options here.
pd.set_option('display.max_colwidth', None)
# pd.set_option('display.float_format', '{:.2f}'.format)
For recent years, we have been heard at least once about crypto, particuarly Bitcoin and its application. Although digital currency brought many controversial opinions up to now, we cannot realize its success and popularity.
So I decided to deep in crypto market and fortunately, I found the historical price data of crypto currency on Kaggle. Although the data does not contain all information about a specific coin, the historical price can bring us to take an overview of digital currency evolution.
Throughout this project, I hope that we can have a perspective objectively and a basic knowledge about digital currency.
In this project, we will find out the crypto currency market and its history, with expectation that we can find a method to buy a coin and take profit.
I found the datasets used in this project was public on Kaggle: Crypto Currrency price history.
The data is taken from coinmarketcap and it is free to use the data.
More about the license I found on kaggle:
The dataset has one csv file for each currency. Price history is available on a daily basis from *April 28, 2013*. This dataset has the historical price information of some of the top crypto currencies by market capitalization, including Bitcoin, Etherium, Binance Coin,...
All information of columns contain in the file named description.txt.
Datasets were collected by Sudalai Rajkumar(registered account name SRK) on Kaggle.
As the author's answer, there is not mentioned clearly how the way he collected data, maybe collected by scraping historical data web. For instance: bitcoin.
Besides the historical price of digital currency, there are many factors in a coin we need to take consideration on. For example: Holders Statistic over time, legality, team and organization investors, who created the coin...
So that the dataset is subjective.
First, we cannot access each files to analyze, it must be combined into one.
path = 'archive/'
os.chdir(path)
ext = 'csv'
all_files = [f for f in glob.glob(f'*.{ext}')]
print(f'Containing number of files: {len(all_files)}.')
print(all_files)
os.chdir('../')
Containing number of files: 23. ['coin_XRP.csv', 'coin_Ethereum.csv', 'coin_Aave.csv', 'coin_USDCoin.csv', 'coin_Solana.csv', 'coin_EOS.csv', 'coin_WrappedBitcoin.csv', 'coin_Cardano.csv', 'coin_NEM.csv', 'coin_Tron.csv', 'coin_CryptocomCoin.csv', 'coin_BinanceCoin.csv', 'coin_Iota.csv', 'coin_ChainLink.csv', 'coin_Monero.csv', 'coin_Dogecoin.csv', 'coin_Tether.csv', 'coin_Bitcoin.csv', 'coin_Cosmos.csv', 'coin_Litecoin.csv', 'coin_Stellar.csv', 'coin_Polkadot.csv', 'coin_Uniswap.csv']
Lets check duplicated rows.
for f in all_files:
temp = pd.read_csv(path+f)
print(f'duplicated in {f}:', temp.index.duplicated().sum())
duplicated in coin_XRP.csv: 0 duplicated in coin_Ethereum.csv: 0 duplicated in coin_Aave.csv: 0 duplicated in coin_USDCoin.csv: 0 duplicated in coin_Solana.csv: 0 duplicated in coin_EOS.csv: 0 duplicated in coin_WrappedBitcoin.csv: 0 duplicated in coin_Cardano.csv: 0 duplicated in coin_NEM.csv: 0 duplicated in coin_Tron.csv: 0 duplicated in coin_CryptocomCoin.csv: 0 duplicated in coin_BinanceCoin.csv: 0 duplicated in coin_Iota.csv: 0 duplicated in coin_ChainLink.csv: 0 duplicated in coin_Monero.csv: 0 duplicated in coin_Dogecoin.csv: 0 duplicated in coin_Tether.csv: 0 duplicated in coin_Bitcoin.csv: 0 duplicated in coin_Cosmos.csv: 0 duplicated in coin_Litecoin.csv: 0 duplicated in coin_Stellar.csv: 0 duplicated in coin_Polkadot.csv: 0 duplicated in coin_Uniswap.csv: 0
So there is no duplicated rows. We carry out combining files.
combined_csv = pd.concat([pd.read_csv(path + fname) for fname in all_files])
print('duplicated in combine files: ', combined_csv.index.duplicated().sum())
duplicated in combine files: 34091
Why the combined files take duplicated? Because column 'SNo' is the index starting with $1$ for each files. So we must reset the index in combined files.
combined_csv.drop(columns='SNo',inplace = True)
combined_csv.to_csv('crypto_finance.csv', index=False)
combined_csv = pd.read_csv('crypto_finance.csv')
combined_csv.index.rename('SNo', inplace = True)
combined_csv.to_csv('crypto_finance.csv')
Read new file. Let's see some values.
cryp_df = pd.read_csv('crypto_finance.csv', index_col='SNo')
cryp_df.sample(10)
| Name | Symbol | Date | High | Low | Open | Close | Volume | Marketcap | |
|---|---|---|---|---|---|---|---|---|---|
| SNo | |||||||||
| 18232 | Chainlink | LINK | 2018-03-21 23:59:59 | 0.455803 | 0.390060 | 0.392459 | 0.445421 | 1.282360e+07 | 1.558974e+08 |
| 31198 | Litecoin | LTC | 2013-12-31 23:59:59 | 24.603800 | 23.709999 | 24.505100 | 24.347000 | 7.446870e+06 | 5.952194e+08 |
| 22268 | Dogecoin | DOGE | 2014-08-03 23:59:59 | 0.000210 | 0.000203 | 0.000209 | 0.000203 | 1.648620e+05 | 1.803561e+07 |
| 31425 | Litecoin | LTC | 2014-08-15 23:59:59 | 5.212860 | 4.967520 | 5.059370 | 5.048590 | 1.986680e+06 | 1.573628e+08 |
| 35994 | Stellar | XLM | 2020-03-18 23:59:59 | 0.037372 | 0.034913 | 0.036742 | 0.035979 | 3.370208e+08 | 7.291712e+08 |
| 24076 | Dogecoin | DOGE | 2019-07-16 23:59:59 | 0.003125 | 0.002802 | 0.003093 | 0.002825 | 4.098083e+07 | 3.400399e+08 |
| 25391 | Tether | USDT | 2016-10-16 23:59:59 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 2.921960e+05 | 6.951591e+06 |
| 24841 | Tether | USDT | 2015-04-15 23:59:59 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 4.070020e+04 | 2.516000e+05 |
| 22169 | Dogecoin | DOGE | 2014-04-26 23:59:59 | 0.000484 | 0.000461 | 0.000476 | 0.000472 | 9.989290e+05 | 3.509400e+07 |
| 18123 | Chainlink | LINK | 2017-12-02 23:59:59 | 0.222175 | 0.167361 | 0.177941 | 0.207898 | 7.396790e+06 | 7.276430e+07 |
More information about dataset.
cryp_df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 37082 entries, 0 to 37081 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 37082 non-null object 1 Symbol 37082 non-null object 2 Date 37082 non-null object 3 High 37082 non-null float64 4 Low 37082 non-null float64 5 Open 37082 non-null float64 6 Close 37082 non-null float64 7 Volume 37082 non-null float64 8 Marketcap 37082 non-null float64 dtypes: float64(6), object(3) memory usage: 2.8+ MB
cryp_df.shape
(37082, 9)
There are 37082 rows and 9 columns.
The dataset shows that each samples contain the information of a specific coin at each time.
cryp_df.index.duplicated().sum()
0
There is no duplicated samples.
f = open('description.txt', 'r')
print(f.read())
f.close()
1) SNo: Serial Number. 2) Name: Name of coin. 3) Symbol: Symbol of coin. 4) Date: Date of observation. 5) Open: Opening price on the given day. 6) High: Highest price on the given day. 7) Low: Lowest price on the given day. 8) Close: Closing price on the given day. 9) Volume: Volume of transactions on the given day (in USD). 10) Market Cap: Market capitalization in USD.
cryp_df.dtypes
Name object Symbol object Date object High float64 Low float64 Open float64 Close float64 Volume float64 Marketcap float64 dtype: object
Overall, all the column types is correct except Date. We need to change Date column to appropriate type (datetime).
cryp_df['Date'] = pd.to_datetime(cryp_df['Date'], format="%Y-%m-%d %X")
nume_cols_df = cryp_df.describe(include=['number', 'datetime'], datetime_is_numeric=True)
nume_cols_df.loc['missing_ratio'] = cryp_df.agg(lambda x: x.isna().mean()*100)
nume_cols_df
| Date | High | Low | Open | Close | Volume | Marketcap | |
|---|---|---|---|---|---|---|---|
| count | 37082 | 37082.000000 | 37082.000000 | 37082.000000 | 37082.000000 | 3.708200e+04 | 3.708200e+04 |
| mean | 2018-08-16 07:12:30.221616896 | 1016.058015 | 952.987707 | 985.323755 | 987.120511 | 3.022542e+09 | 1.542943e+10 |
| min | 2013-04-29 23:59:59 | 0.000089 | 0.000079 | 0.000086 | 0.000086 | 0.000000e+00 | 0.000000e+00 |
| 25% | 2017-03-05 23:59:59 | 0.075664 | 0.069536 | 0.072456 | 0.072648 | 4.937190e+06 | 2.395955e+08 |
| 50% | 2019-01-09 23:59:59 | 1.008733 | 0.999850 | 1.001157 | 1.001138 | 8.512805e+07 | 1.405335e+09 |
| 75% | 2020-05-13 23:59:59 | 31.916399 | 28.996246 | 30.459673 | 30.512205 | 9.388489e+08 | 5.159305e+09 |
| max | 2021-07-06 23:59:59 | 64863.098908 | 62208.964366 | 63523.754869 | 63503.457930 | 3.509679e+11 | 1.186364e+12 |
| std | NaN | 5249.503670 | 4907.932082 | 5088.101367 | 5093.703878 | 1.190963e+10 | 7.059128e+10 |
| missing_ratio | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 |
There is no missing value in DataFrame. But it exists abnormal values in min row of Volume and Marketcap col.
So, lets take a look at these columns.
check_df = cryp_df[(cryp_df['Marketcap'] == 0) | (cryp_df['Volume'] == 0)]
check_df
| Name | Symbol | Date | High | Low | Open | Close | Volume | Marketcap | |
|---|---|---|---|---|---|---|---|---|---|
| SNo | |||||||||
| 0 | XRP | XRP | 2013-08-05 23:59:59 | 0.005980 | 0.005613 | 0.005875 | 0.005613 | 0.000000e+00 | 4.387916e+07 |
| 1 | XRP | XRP | 2013-08-06 23:59:59 | 0.005661 | 0.004629 | 0.005637 | 0.004680 | 0.000000e+00 | 3.659101e+07 |
| 2 | XRP | XRP | 2013-08-07 23:59:59 | 0.004682 | 0.004333 | 0.004669 | 0.004417 | 0.000000e+00 | 3.453412e+07 |
| 3 | XRP | XRP | 2013-08-08 23:59:59 | 0.004424 | 0.004175 | 0.004397 | 0.004254 | 0.000000e+00 | 3.325863e+07 |
| 4 | XRP | XRP | 2013-08-09 23:59:59 | 0.004367 | 0.004253 | 0.004257 | 0.004291 | 0.000000e+00 | 3.354750e+07 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36477 | Polkadot | DOT | 2020-08-28 23:59:59 | 6.333746 | 5.540963 | 5.639486 | 6.159955 | 7.271622e+08 | 0.000000e+00 |
| 36478 | Polkadot | DOT | 2020-08-29 23:59:59 | 6.562906 | 6.042309 | 6.175925 | 6.159143 | 5.272900e+08 | 0.000000e+00 |
| 36479 | Polkadot | DOT | 2020-08-30 23:59:59 | 6.219506 | 5.749978 | 6.153440 | 5.869881 | 4.853351e+08 | 0.000000e+00 |
| 36480 | Polkadot | DOT | 2020-08-31 23:59:59 | 6.459377 | 5.772966 | 5.905918 | 6.300020 | 5.126048e+08 | 0.000000e+00 |
| 36481 | Polkadot | DOT | 2020-09-01 23:59:59 | 6.838898 | 6.172324 | 6.298810 | 6.288767 | 6.015274e+08 | 0.000000e+00 |
971 rows × 9 columns
We got some problems here: Abnormal values on Volume and Marketcap col. We can calculate the ratio of abnormal values to decide whether to drop or not.
abn_df = check_df.groupby(['Symbol']).size()
all_df = cryp_df.groupby(['Symbol']).size()
(abn_df/all_df *100).dropna()
Symbol AAVE 0.363636 ATOM 5.443787 BTC 8.090939 CRO 0.213904 DOGE 0.398551 DOT 3.750000 EOS 0.068213 LTC 8.090939 SOL 11.504425 TRX 1.005747 USDC 0.798403 WBTC 22.072072 XRP 4.977532 dtype: float64
As the results, the number of invalid values is quite small, we can delete samples containing invalid values. But once we dropped it, we will face the intermittence of time data. So I decided not to drop.
cate_cols = cryp_df.describe(include=[object]).columns
cate_cols_df = cryp_df[cate_cols].agg([
lambda x: x.isna().mean() *100,
pd.Series.nunique,
pd.Series.unique])
cate_cols_df.rename(index ={
'<lambda>': 'missing_ratio',
'nunique':'num_dif_val',
'unique':'diff_vals'
})
| Name | Symbol | |
|---|---|---|
| missing_ratio | 0.0 | 0.0 |
| num_dif_val | 23 | 23 |
| diff_vals | [XRP, Ethereum, Aave, USD Coin, Solana, EOS, Wrapped Bitcoin, Cardano, NEM, TRON, Crypto.com Coin, Binance Coin, IOTA, Chainlink, Monero, Dogecoin, Tether, Bitcoin, Cosmos, Litecoin, Stellar, Polkadot, Uniswap] | [XRP, ETH, AAVE, USDC, SOL, EOS, WBTC, ADA, XEM, TRX, CRO, BNB, MIOTA, LINK, XMR, DOGE, USDT, BTC, ATOM, LTC, XLM, DOT, UNI] |
There is no missing value in categorical columns. Everything seems to be fine.
Before investing any markets, investors must know the basis to save their money. One of the most crucial knowledge is stable/unstable coins:
We can find the answer by calculating the volatility.
Now we are carrying out calculating the volatility of each coin month by month.
\begin{equation} Monthly\_volatility = \frac{Values(Close\_end\_of\_month)-Values(Close\_month\_before)}{Values(Close\_month\_before)}*100 \end{equation}last_dom_prep = cryp_df.reset_index('SNo').drop(columns=['SNo','Symbol'])
last_dom_prep['Month'] = last_dom_prep.Date.dt.month
last_dom_prep['Year'] = last_dom_prep.Date.dt.year
last_dom = last_dom_prep.groupby(['Name','Year','Month'])['Date'].max().reset_index()
#make condition
Name = (cryp_df['Name'].isin(last_dom['Name']))
date = (cryp_df['Date'].isin(last_dom['Date']))
#gat all cols of last date each month and each Name.
last_dom_df = last_dom_prep[Name & date]
last_dom_df=last_dom_df.drop(columns=['Month','Year'])
#find min date and max date of each coin => delete min date and duplicate max date.
min_max = last_dom_df[['Name','Date']].groupby('Name').agg([min,max])
min_max = min_max.reset_index(col_level=1).droplevel(level=0,axis=1).set_index(['Name','min'])
substr = last_dom_df.set_index(['Name','Date']).drop(index=min_max.index)
min_max = min_max.reset_index().set_index(['Name','max'])
besubstr = last_dom_df.set_index(['Name','Date']).drop(index=min_max.index).reset_index()
#increase to end of next month
besubstr = besubstr.assign(
Date = besubstr.Date + pd.Timedelta(days=1)
)
besubstr = besubstr.assign(
Date = besubstr.Date.apply(
lambda x:x+ relativedelta.relativedelta(months=1)
)
)
besubstr = besubstr.assign(
Date = besubstr.Date - pd.Timedelta(days=1)
)
besubstr = besubstr.set_index(['Name','Date'])
#calculating the volatility
res = ((substr - besubstr)*100/besubstr).reset_index()
res = pd.pivot(res, columns = 'Name',index='Date',values='Close')
res
| Name | Aave | Binance Coin | Bitcoin | Cardano | Chainlink | Cosmos | Crypto.com Coin | Dogecoin | EOS | Ethereum | ... | NEM | Polkadot | Solana | Stellar | TRON | Tether | USD Coin | Uniswap | Wrapped Bitcoin | XRP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2013-05-31 23:59:59 | NaN | NaN | -7.194245 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-06-30 23:59:59 | NaN | NaN | -25.105428 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-07-31 23:59:59 | NaN | NaN | 9.808100 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-08-31 23:59:59 | NaN | NaN | 27.580366 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-09-30 23:59:59 | NaN | NaN | -1.736244 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 94.118035 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-04-30 23:59:59 | 17.962651 | 106.605778 | -1.983501 | 13.340260 | 29.904701 | 18.930354 | -8.210962 | 529.105892 | 33.862844 | 44.561196 | ... | -7.944032 | -1.313496 | 120.680681 | 28.870459 | 43.241110 | -0.001135 | 0.021877 | 45.418800 | -2.008534 | 177.358124 |
| 2021-05-31 23:59:59 | -13.861344 | -43.223781 | -35.354564 | 28.860927 | -15.840958 | -38.427119 | -36.641910 | -3.481047 | 3.160672 | -2.100881 | ... | -44.645530 | -35.877113 | -23.604979 | -23.817326 | -41.827906 | 0.052419 | 0.044035 | -30.291554 | -35.398383 | -34.246312 |
| 2021-06-30 23:59:59 | -34.639432 | -14.402834 | -6.139409 | -20.616898 | -39.239339 | -14.224586 | -8.173372 | -21.974458 | -37.777659 | -16.221234 | ... | -31.126707 | -29.997673 | 8.350011 | -29.853028 | -11.406481 | -0.031692 | -0.017570 | -32.045981 | -6.154983 | -32.506669 |
| 2021-07-06 23:59:59 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-07-31 23:59:59 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
102 rows × 23 columns
Visualizing the volatility of each coin to observe clearly. From now on, we can assume a coin is stable if its volatility is not beyond to $[-15,15]$ and the data is enough for our decisions.
# fig, axs = plt.subplots(int(res.columns.size/2 + 1),2, figsize=(15,60))
# fig.tight_layout()
# for i in range(res.columns.size):
# axs[int(i/2),i%2].plot(res[res.columns.values[i]])
# axs[int(i/2),i%2].set_title(res.columns.values[i])
fig = make_subplots(rows = int(res.columns.size/2 + 1), cols = 2,
vertical_spacing=0.02,
subplot_titles=tuple(res.columns.values))
for i in range(res.columns.size):
fig.add_trace(
go.Scatter(x = res.index,
y =res[res.columns.values[i]],
mode='lines',
name = res.columns.values[i]
),
row = int(i/2)+1, col = i%2 + 1
)
fig.update_layout(height=4000,
showlegend=False,
title_text="Volatility of each coin")
fig.show()
We could take a look at the vilatility year by year if we are a long-term investor.
All processes like above. We appoximates the yearly volatility base on monthly volatility.
Standard Deviation of monthly volatility.
\begin{equation} STD\_Monthly\_volatility=\sqrt{\frac{\sum{(vola\_month_i- \mu)^2}}{N=Num\_of\_month}} \end{equation}Anually volatility.
\begin{equation} Annual\_volatility=STD\_Monthly\_volatility \sqrt{Num\_of\_month} \end{equation}res1 = ((substr-besubstr)*100/besubstr).reset_index()
#groupby year
res1['Year'] = res1.Date.dt.year
res1 = res1[['Name','Year','Close']].groupby(['Name','Year']).agg([np.std,np.size])\
.reset_index(col_level=1)
res1 = res1.droplevel(level=0,axis=1)
#Monthly volatility = standard deviation of month-base volatility.
#Annual volatility = Monthly Volatility * sqrt(num time period)
res1['AnnualVola'] = res1['std']*np.sqrt(res1['size'])
res1 = pd.pivot(res1, columns = 'Name',index='Year',values='AnnualVola')
res1
| Name | Aave | Binance Coin | Bitcoin | Cardano | Chainlink | Cosmos | Crypto.com Coin | Dogecoin | EOS | Ethereum | ... | NEM | Polkadot | Solana | Stellar | TRON | Tether | USD Coin | Uniswap | Wrapped Bitcoin | XRP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | |||||||||||||||||||||
| 2013 | NaN | NaN | 457.109181 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 539.563746 |
| 2014 | NaN | NaN | 66.556166 | NaN | NaN | NaN | NaN | 362.003329 | NaN | NaN | ... | NaN | NaN | NaN | 162.539517 | NaN | NaN | NaN | NaN | NaN | 199.281864 |
| 2015 | NaN | NaN | 66.726704 | NaN | NaN | NaN | NaN | 86.039260 | NaN | 59.311735 | ... | 96.102720 | NaN | NaN | 58.689299 | NaN | 0.000000 | NaN | NaN | NaN | 100.429210 |
| 2016 | NaN | NaN | 49.309806 | NaN | NaN | NaN | NaN | 95.502047 | NaN | 236.411589 | ... | 600.502428 | NaN | NaN | 93.278866 | NaN | 0.006333 | NaN | NaN | NaN | 65.282208 |
| 2017 | NaN | 1955.111950 | 95.957177 | 206.464997 | 290.086523 | NaN | NaN | 410.123269 | 320.308490 | 262.211431 | ... | 459.309983 | NaN | NaN | 688.910700 | 1892.737961 | 13.311229 | NaN | NaN | NaN | 869.089489 |
| 2018 | NaN | 73.193491 | 70.876683 | 158.615169 | 147.557427 | NaN | NaN | 151.124506 | 222.328876 | 124.953128 | ... | 122.565094 | NaN | NaN | 158.527085 | 203.551788 | 3.668578 | 0.795658 | NaN | NaN | 132.638788 |
| 2019 | NaN | 119.253488 | 77.494807 | 85.825074 | 267.470088 | 88.550594 | 272.493451 | 56.080850 | 113.331961 | 88.975236 | ... | 103.686407 | NaN | NaN | 69.377906 | 85.089444 | 2.223591 | 2.669780 | NaN | 76.247803 | 62.627851 |
| 2020 | 142.704238 | 72.601705 | 79.796929 | 141.519815 | 156.576276 | 127.325853 | 109.703283 | 76.319909 | 97.297391 | 103.716362 | ... | 192.908441 | 89.340022 | 242.260489 | 187.488532 | 85.112881 | 1.125439 | 3.647628 | 98.530297 | 81.547819 | 204.431611 |
| 2021 | 287.504510 | 430.664039 | 74.921018 | 318.187039 | 135.334441 | 150.046174 | 157.141160 | 901.397567 | 78.032515 | 98.454019 | ... | 215.813788 | 164.222431 | 266.540549 | 173.644518 | 141.507427 | 0.321865 | 0.118152 | 285.846007 | 75.285197 | 252.159734 |
9 rows × 23 columns
# fig, axs = plt.subplots(int(res1.columns.size/2 + 1),2, figsize=(15,60))
# fig.tight_layout()
# for i in range(res1.columns.size):
# axs[int(i/2),i%2].plot(res1[res1.columns.values[i]])
# axs[int(i/2),i%2].set_title(res1.columns.values[i])
fig = make_subplots(rows = int(res1.columns.size/2 + 1), cols = 2,
vertical_spacing=0.02,
subplot_titles=tuple(res1.columns.values))
for i in range(res1.columns.size):
fig.add_trace(
go.Scatter(x = res1.index,
y =res1[res1.columns.values[i]],
mode='lines',
name = res1.columns.values[i]
),
row = int(i/2)+1, col = i%2 + 1
)
fig.update_layout(height=4000,
showlegend=False,
title_text="Volatility of each coin")
fig.show()
Before deciding to invest a digital currency, investors must save their capital. How to save their own capital while almost digital currencies have a high volatility? $\Rightarrow$ They must find and reserve in a stable coin.
According to the result, we can consider which one is a stablecoin. Here we have the coins take least volatility whether investors is middle-term or long-term:
To see crypto currency evolution, we could use MarketCap col.
true_mcap = cryp_df[(cryp_df['Marketcap'] > 0) & (cryp_df['Date'].dt.year == 2021)]
mean_mcap = true_mcap[['Name','Symbol','Marketcap']].groupby(['Name','Symbol']).mean()
pd.set_option('display.float_format', '{:.2f}'.format)
mean_mcap
| Marketcap | ||
|---|---|---|
| Name | Symbol | |
| Aave | AAVE | 4342005247.88 |
| Binance Coin | BNB | 45863429127.74 |
| Bitcoin | BTC | 850308418885.51 |
| Cardano | ADA | 36073727647.25 |
| Chainlink | LINK | 11780371063.65 |
| Cosmos | ATOM | 3439341205.25 |
| Crypto.com Coin | CRO | 3452219867.56 |
| Dogecoin | DOGE | 23253147261.34 |
| EOS | EOS | 4767997966.08 |
| Ethereum | ETH | 239910890392.16 |
| IOTA | MIOTA | 3403257862.25 |
| Litecoin | LTC | 13150059074.47 |
| Monero | XMR | 4459267395.45 |
| NEM | XEM | 2850426717.84 |
| Polkadot | DOT | 25611960118.95 |
| Solana | SOL | 6176877049.63 |
| Stellar | XLM | 9270609681.84 |
| TRON | TRX | 5192913205.79 |
| Tether | USDT | 44067450019.42 |
| USD Coin | USDC | 12663720269.43 |
| Uniswap | UNI | 12035294126.12 |
| Wrapped Bitcoin | WBTC | 6764035608.35 |
| XRP | XRP | 33799135106.02 |
Visualyzing...
mean_mcap = mean_mcap.reset_index()
fig = px.pie(mean_mcap, values= 'Marketcap', names='Name',
title="Proportion of currency in 2021")
fig.update_traces(textposition = 'inside',textinfo='percent+label')
As a result, it is said that Bitcoin is the main part of crypto currency, and other coins are assumed to be alternative coins (called altcoins).
History price chart.
btc_df = cryp_df[cryp_df['Symbol'] == 'BTC'].drop(columns='Marketcap')
fig = make_subplots(cols=1, rows=2,
shared_xaxes=True,
vertical_spacing=0.05,
subplot_titles=('Bitcoin history price','Volume'))
fig.add_trace(go.Candlestick(x=btc_df['Date'],
open=btc_df['Open'],
close = btc_df['Close'],
high=btc_df['High'],
low = btc_df['Low'],
),
row=1,col=1
)
btc_df = btc_df[btc_df['Volume'] > 0]
fig.add_trace(
go.Scatter(x=btc_df['Date'], y=btc_df['Volume']),
row=2, col =1
)
fig.update_layout(height=1000,
xaxis_rangeslider_visible=False,
showlegend=False,
title='Bitcoin analysis')
fig.show()
After preprocessing and exploring, we had an overview about the dataset. Lets turn back to our purpose. There are 2 main questions we must answer:
With this question and the results of data exploratory, we could know how the digital currency work. The answer of this question tells us we should invest alternative coins or bitcoin.
We choose top_7 coins and stable coins except Bitcoin representing alternative coins.
stb_coin=['USDT','USDC']
mean_mcap = mean_mcap.sort_values('Marketcap',ascending=False)
stb_coin = mean_mcap[mean_mcap['Symbol'].isin(stb_coin)]
alt_and_BTC_coin = mean_mcap[:8]
alt_and_BTC_coins = pd.concat([stb_coin,alt_and_BTC_coin],axis=0)
alt_and_BTC_coins = alt_and_BTC_coins.drop_duplicates()
corr_df = cryp_df[cryp_df['Symbol'].isin(alt_and_BTC_coins['Symbol'])]
corr_df = pd.pivot(corr_df,values='Close', index='Date',columns='Symbol')
corr_df
| Symbol | ADA | BNB | BTC | DOGE | DOT | ETH | USDC | USDT | XRP |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2013-04-29 23:59:59 | NaN | NaN | 144.54 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-04-30 23:59:59 | NaN | NaN | 139.00 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-05-01 23:59:59 | NaN | NaN | 116.99 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-05-02 23:59:59 | NaN | NaN | 105.21 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-05-03 23:59:59 | NaN | NaN | 97.75 | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-07-02 23:59:59 | 1.39 | 287.42 | 33897.05 | 0.25 | 15.34 | 2150.04 | 1.00 | 1.00 | 0.66 |
| 2021-07-03 23:59:59 | 1.41 | 298.24 | 34668.55 | 0.25 | 15.55 | 2226.11 | 1.00 | 1.00 | 0.67 |
| 2021-07-04 23:59:59 | 1.46 | 307.73 | 35287.78 | 0.25 | 16.01 | 2321.72 | 1.00 | 1.00 | 0.69 |
| 2021-07-05 23:59:59 | 1.40 | 302.38 | 33746.00 | 0.23 | 15.24 | 2198.58 | 1.00 | 1.00 | 0.65 |
| 2021-07-06 23:59:59 | 1.42 | 320.93 | 34235.19 | 0.23 | 16.14 | 2324.68 | 1.00 | 1.00 | 0.67 |
2991 rows × 9 columns
Calculating correlation efficient.
corr_df.corr()
| Symbol | ADA | BNB | BTC | DOGE | DOT | ETH | USDC | USDT | XRP |
|---|---|---|---|---|---|---|---|---|---|
| Symbol | |||||||||
| ADA | 1.00 | 0.89 | 0.86 | 0.83 | 0.85 | 0.96 | -0.24 | -0.07 | 0.65 |
| BNB | 0.89 | 1.00 | 0.86 | 0.89 | 0.85 | 0.90 | -0.22 | -0.08 | 0.48 |
| BTC | 0.86 | 0.86 | 1.00 | 0.65 | 0.95 | 0.90 | -0.32 | 0.04 | 0.62 |
| DOGE | 0.83 | 0.89 | 0.65 | 1.00 | 0.57 | 0.84 | -0.17 | 0.00 | 0.49 |
| DOT | 0.85 | 0.85 | 0.95 | 0.57 | 1.00 | 0.84 | 0.02 | -0.08 | 0.71 |
| ETH | 0.96 | 0.90 | 0.90 | 0.84 | 0.84 | 1.00 | -0.27 | 0.03 | 0.70 |
| USDC | -0.24 | -0.22 | -0.32 | -0.17 | 0.02 | -0.27 | 1.00 | 0.23 | -0.11 |
| USDT | -0.07 | -0.08 | 0.04 | 0.00 | -0.08 | 0.03 | 0.23 | 1.00 | 0.07 |
| XRP | 0.65 | 0.48 | 0.62 | 0.49 | 0.71 | 0.70 | -0.11 | 0.07 | 1.00 |
eth_df = cryp_df[cryp_df['Symbol']=='ETH']
tether_df = cryp_df[cryp_df['Symbol']=='USDT']
fig = make_subplots(cols=1, rows=3,
shared_xaxes=True,
vertical_spacing=0.05,
subplot_titles=('Bitcoin historical price',
'Ethereum historical price',
'Tether historical price'
))
fig.add_trace(
go.Candlestick(x=btc_df['Date'],
open=btc_df['Open'],
close = btc_df['Close'],
high=btc_df['High'],
low = btc_df['Low'],
),
row=1,col=1
)
fig.add_trace(
go.Candlestick(x=eth_df['Date'],
open=eth_df['Open'],
close = eth_df ['Close'],
high=eth_df['High'],
low = eth_df['Low'],
),
row=2, col=1
)
fig.add_trace(
go.Candlestick(x=tether_df['Date'],
open=tether_df['Open'],
close = tether_df['Close'],
high=tether_df['High'],
low = tether_df['Low'],
),
row=3, col=1
)
for i in range(3):
fig.update_xaxes(row=i+1, col=1, rangeslider_visible=False)
fig.update_layout(height=1500,
showlegend=False,
title='Crypto analysis')
fig.show()
But, how to choose other coins for optimizing profits. We continue to the next question.
We can answer this question by being greedy in profits... which coins has the best profits.
Do High,Low,Open,Close cols have values been less than or equal to 0?
(cryp_df[['High','Low','Open','Close']] <= 0).sum()
High 0 Low 0 Open 0 Close 0 dtype: int64
Get max profits by dividing max of high to min of low for each coins.
\begin{equation}Profits\_of\_coins = \frac{Max(High)}{Min(Low)}\end{equation}calc_coins = cryp_df[['Name','Symbol','High','Low']].groupby(['Name','Symbol']).agg([np.min,np.max])
#Find max of High and min of Low
calc_coins = calc_coins.assign(
High_max = calc_coins['High']['amax'],
Low_min = calc_coins['High']['amin']
)
calc_coins = calc_coins.drop(columns=['High','Low']).droplevel(level=1,axis=1)
#Calculating profits
profits = pd.DataFrame(calc_coins['High_max'] / calc_coins['Low_min'])
profits = profits.rename({0:'Proportion'},axis=1)
profits = profits.reset_index().drop(columns='Symbol')
#
profits = profits.sort_values('Proportion',ascending=False)
profits
| Name | Proportion | |
|---|---|---|
| 13 | NEM | 23506.73 |
| 9 | Ethereum | 9032.01 |
| 7 | Dogecoin | 8269.84 |
| 1 | Binance Coin | 6826.65 |
| 12 | Monero | 2056.03 |
| 22 | XRP | 1246.49 |
| 2 | Bitcoin | 869.93 |
| 16 | Stellar | 621.70 |
| 4 | Chainlink | 335.85 |
| 11 | Litecoin | 307.08 |
| 17 | TRON | 164.07 |
| 3 | Cardano | 116.95 |
| 15 | Solana | 104.16 |
| 10 | IOTA | 45.62 |
| 8 | EOS | 44.13 |
| 0 | Aave | 23.45 |
| 6 | Crypto.com Coin | 21.74 |
| 20 | Uniswap | 20.82 |
| 21 | Wrapped Bitcoin | 18.70 |
| 5 | Cosmos | 17.72 |
| 14 | Polkadot | 16.08 |
| 18 | Tether | 2.00 |
| 19 | USD Coin | 1.11 |
Visualyzing...
plt.figure(figsize=(8,6))
plt.style.use('seaborn-whitegrid')
sns.barplot(x = profits['Proportion'], y = profits['Name'])
plt.title('Profits')
plt.xlabel('Ratio');
We can take consideration on NEM, Ethereum, Dogecoin, Binance Coin, Monero,... before actually investing.
Calculating risk based on the last day of Dataset collected:
\begin{equation} Risks=\frac{min(Low\_of\_last\_day)-max(High)}{max(High)}*100 \end{equation}max_date = cryp_df[['Name','Symbol','Date']].groupby(['Name','Symbol']).max()\
.reset_index().set_index(['Name','Symbol','Date'])
#get last days
max_date_df = cryp_df.set_index(['Name','Symbol','Date'])
max_date_df = max_date_df[max_date_df.index.isin(max_date.index)]\
.reset_index().set_index(['Name','Symbol']).drop(columns='Date')
#Take risks
risk = pd.DataFrame(
(max_date_df['Low'] - calc_coins['High_max'])*100/calc_coins['High_max']
).rename({0:'Risks'},axis=1).reset_index()
#sorting values
risk = risk.sort_values('Risks')
risk
| Name | Symbol | Risks | |
|---|---|---|---|
| 13 | NEM | XEM | -93.84 |
| 10 | IOTA | MIOTA | -85.58 |
| 8 | EOS | EOS | -83.41 |
| 22 | XRP | XRP | -83.01 |
| 17 | TRON | TRX | -78.69 |
| 16 | Stellar | XLM | -72.97 |
| 14 | Polkadot | DOT | -69.39 |
| 7 | Dogecoin | DOGE | -68.84 |
| 11 | Litecoin | LTC | -67.09 |
| 4 | Chainlink | LINK | -65.29 |
| 5 | Cosmos | ATOM | -60.10 |
| 12 | Monero | XMR | -58.99 |
| 6 | Crypto.com Coin | CRO | -57.29 |
| 1 | Binance Coin | BNB | -56.26 |
| 20 | Uniswap | UNI | -55.36 |
| 0 | Aave | AAVE | -53.81 |
| 9 | Ethereum | ETH | -49.62 |
| 2 | Bitcoin | BTC | -48.20 |
| 21 | Wrapped Bitcoin | WBTC | -47.99 |
| 15 | Solana | SOL | -43.52 |
| 3 | Cardano | ADA | -43.40 |
| 18 | Tether | USDT | -17.53 |
| 19 | USD Coin | USDC | -10.24 |
Visualyzing...
plt.figure(figsize=(8,6))
plt.style.use('seaborn-whitegrid')
sns.barplot(x = risk['Risks'], y = risk['Name'])
plt.title('Risks')
plt.yticks(rotation=45)
plt.xlabel('Percentage');
We can take consideration on NEM, IOTA,EOS, XRP... before actually investing.
We can easily observe that some coins having very good profits(like NEM can bring to investors $> 20000$ times based on their money in), but besides it, it could take the investors' money by substact $93.84%$ based on their money in.
So investors can pour out their money on the coins that take less risks, E.X: Solana or Bitcoin,...
**Note**:The answer to this question is for reference only